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LOW-RANK APPROXIMATION OF TENSORS VIA SPARSE 

OPTIMIZATION 

XIAOFEI WANG* AND CARMELIZA NAVASCAt 


Abstract. The goal of this paper is to find a low-rank approximation for a given tensor. Specifi¬ 
cally, we give a computable strategy on calculating the rank of a given tensor, based on approximating 
the solution to an NP-hard problem. In this paper, we formulate a sparse optimization problem via 
an /i-regularization to find a low-rank approximation of tensors. To solve this sparse optimization 
problem, we propose a rescaling algorithm of the proximal alternating minimization and study the 
theoretical convergence of this algorithm. Furthermore, we discuss the probabilistic consistency of the 
sparsity result and suggest a way to choose the regularization parameter for practical computation. 
In the simulation experiments, the performance of our algorithm supports that our method provides 
an efficient estimate on the number of rank-one tensor components in a given tensor. Moreover, this 
algorithm is also applied to surveillance videos for low-rank approximation. 

Key words, /i-regularization, low-rank approximation, proximal alternating minimization, 
sparsity 

AMS subject classifications. 15A69, 65F30 

1. Introduction. We have seen the success of the matrix SVD for several decades. 
However in the advent of modern and massive datasets, even SVD has its limitation. 
Since tensors have been known to be a natural representation of higher-order and 
hierarchical dimensional datasets, we focus on the extension of low rank matrix ap¬ 
proximation to tensors. Tensors have received much attention in recent years in the 
areas of signal processing [11, 37, 38], computer vision [20, 30, 35, 42], neuroscience 
[2, 31], data science and machine learning [24, 42, 20, 1]. Most of these applications 
rely on decomposing a tensor data into its low-rank form to be able to perform ef- 
hcient computing as well as to reduce memory requirements. This type of tensor 
decomposition into a sum of rank-one tensor terms is called the canonical polyadic 
(CP) decomposition; thus, it is viewed as a generalization of the matrix SVD. The 
generalization of matrix SVD to tensors is not unique. Another tensor decomposition 
is called the Higher-Order SVD [13, 24], which is a product of orthogonal matrices 
with a dense core tensor. Higher-order SVD is considered another extension of the 
matrix SVD. 

Unlike the matrix case where the low-rank matrix approximation is afforded by 
truncating away small rank-one matrix terms [17], discarding negligible rank-one 
tensor terms does not necessarily provide the best low-rank tensor approximation 
[25]. Moreover, most low rank tensor algorithms do not provide an estimation on 
the tensor rank; an a priori tensor rank is often required to find the decomposition. 
Several theoretical results [26, 27] on tensor rank can help, but they are limited to 
low-multidimensional and low order tensors so they are inapplicable to tensors in 
real-life applications. In fact, for real dataset, tensor rank is important. In a source 
apportionment data problem [28], the tensor rank of the data provides the number 
of pollution source profiles to be identified. In this work, the focus is on finding an 
estimation of the tensor rank and its rank-one tensor decomposition (CP) of a given 
tensor. There are several numerical techniques [11, 15, 24, 33, 37] for approximating 
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a fcth rank tensor into its CP decomposition, but they do not give an approximation 
of the minimum rank. There are algorithms [12, 5] which give tensor rank, but they 
are specific to symmetric tensor decomposition over the complex field using algebraic 
geometry tools. 

Our proposed algorithm addresses two difficult problems for the CP decomposi¬ 
tion: (a) one is that finding the rank of tensors is a NP-hard problem [21], and (b) 
the other is that tensors can be ill-posed [14] and failed to have their best low-rank 
approximations. 

The tensor rank problem is formulated as an Iq minimization problem; i.e. 

(1.1) min||Q:|[o subject to = [a; X, Y, Z]^ 


R 

where ^ o o = [q:;X, Y, Z]^ represents a sum of the outer products of 

r—1 

the vectors Xr,yr, and z^ for r = l,...,i?. Here ||q:||o corresponds the number of 
nonzero coefficients in the sum. However, this problem formulation (1.1) is NP-hard. 
Inspired by the techniques in compressive sensing [7, 10, 18], we then consider an 
Zi-regularization formulation for tensor rank, 

(1.2) min||a||i subject to ^ = [a; X, Y, Zj/j. 

a 

Here we denote |[q:||i = I®*!- It is well known in the compressed sensing com¬ 

munity that minimizing the £i norm of the vector a recovers the sparse solution of 
the linear system. In the presence of noise, the constraint, A = [q:;X, Y, ZJ/j is 
replaced with ||„4 — [a; X, Y, Zj/jl];’ < e where || • ||f is the Frobenius norm with 
II-4 ||f = Moreover, to achieve a tensor decomposition as well as tensor 

rank, we minimize over the factor matrices, X,Y, and Z, thus this minimization 
problem is considered: 

(1.3) min iM-[a;X,Y,Z]«||| + A||a||i. 

X, Y .Z,q; Z 

The .^i-regularization achieves a good approximation of tensor rank due to the spar¬ 
sity structure and its tractability. In addition, the /i-regularization term provides a 
restriction on the boundedness of the variables thereby ameliorating the ill-posedness 
of the best low-rank approximation of tensors. For more tractable computing, an 
alternative multi-block constraint optimization [4] is implemented, which is similar to 
the technique discussed in [42]. Since (1.3) is a minimization of a sum of a smooth 
term and a nonsmooth term, we consider the following optimization problem with 
smooth and non smooth terms: 

(1.4) = argmin{/(x'=) + (x - x^ Vx/(x'=)) + ^|[x - x'^H^ + g(x)} 

X z 

where / and g are the smooth and nonsmooth functions, respectively. Here / is 
approximated at a given point x^. 

1.1. Contributions. Here we list our contributions in this paper: 

1. We develop an iterative technique for tensor rank approximation given that 
the main objective function contains a nonsmooth Zi-regularization term. The 
proximal alternating minimization technique [4, 42] has been adapted and 
rescaled for our tensor rank minimization problem. 
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2. We provide some theoretical results on the convergence of our algorithm. 
We show the objective function satisfies a descent property in Lemma 4.3 
and a subdifferential lower bound [4]. A monotonically decreasing objective 
function is ensured on the sequence generated by the algorithm. Furthermore, 
we point out that the sequence generated by algorithm converges to a critical 
point of the objective function with indicator functions on the normalization 
constraint that all the columns of the factor matrices have length one. 

3. For practical implementation, we provide a technique (as well as theoretical 
results) to find a suitable choice on the regularization parameter directly from 
the data. The regularization parameter choice has remained a very challeng¬ 
ing problem [34, 36, 23, 32] in applied inverse problems. Our technique is 
based on the probabilistic consistency of the sparsity in the classical model 
found in [41, 43]: 


b = B6»*+e, 


where 9* is a sparse signal, B is a design matrix and e is a vector of indepen¬ 
dent subgaussian entries with mean zero and parameter a^. We show that 
to find the true sparsity structure with a high probability, the regularization 
parameter relies on two intrinsic parameters and 7 of models, where 
represents the variance of noise, and 7 is the incoherence parameter [41] on 
design matrix B. The relationship between the regularization and intrinsic 
parameters actually provides us a suggestion on how to choose a reasonable 
regularization parameter for practical computation. To illustrate the perfor¬ 
mance of this low-rank approximation method, our experiment consists of 
four parts. In the first part, we show the relationship between the regulariza¬ 
tion parameter and the estimated rank. In the second part, we estimate the 
number of rank-one components for given tensors by adaptively selecting the 
regularization parameter A. In the third one, we compare our algorithm with 
a modified alternating least-squares algorithm. In the last one, we handle the 
real surveillance video data. 

1.2. Organization. Our paper is organized as follows. In Section 2, we pro¬ 
vide some notations and terminologies used throughout this paper. In Section 3, we 
formulate an li-regularization optimization to the low-rank approximation of tensors. 
In Section 4, we propose an algorithm to solve this /i-regularization optimization by 
using a rescaling version of the proximal alternating minimization technique. In Sec¬ 
tion 5 we discuss the probabilistic consistency of the sparse optimal solution and give 
a suggestion on how to choose the regularization parameter. The numerical experi¬ 
ments in Section 6 consist of simulated and real datasets. Finally, our conclusion and 
future work are given in Section 7. 

2. Notation. We denote a vector by a bold lower-case letter a. The bold upper¬ 
case letter A represents a matrix and the symbol of tensor is a calligraphic letter A. 
Throughout this paper, we focus on third-order tensors A = (flyfe) £ qJ three 

indices !<*</, l<j<J and 1 < k < K, but all the methods proposed here can 
be also applied to tensors of arbitrary high order. 

A third-order tensor A has column, row and tube fibers, which are defined by 
fixing every index but one and denoted by a-jk, sa-.k and a^, respectively. Cor¬ 
respondingly, we can obtain three kinds Ajij, A( 2 ) and A( 3 ) of matricization of A 
according to respectively arranging the column, row, and tube fibers to be columns of 
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matrices. We can also consider the vectorization for A to obtain a row vector a such 
the elements of A are arranged according to k varying faster than j and j varying 
faster than i, i.e., a = (am, • • • , anK, 0 , 121 , ■■ ■ , 012 K, ■■■ , oiji, • • • , oijjf, • • • )■ 

The outer product xoyoz G of three nonzero vectors x,y and z is a 

rank-one tensor with elements xtyjZk for all the indices. A canonical polyadic (CP) 
decomposition of .4 G expresses A as a sum of rank-one outer products: 

R 

(2.1) A = ^XrOy^oz^ 

r—1 

where x^ G R'^,yr G ,Zr G for 1 < r < R. Every outer product x^ o y^ o 
Zr is called as a rank-one component and the integer R is the number of rank-one 
components in tensor A. The minimal number R such that the decomposition (2.1) 
holds is the rank of tensor A, which is denoted by rank(yl). For any tensor A G 
^ixJxK^ rank(yl) has an upper bound min{/J, JAT,/AT} [26]. 

The CP decomposition (2.1) can be also written as: 

R 

(2.2) A = E ar^r O o 

r—1 

where € M is a rescaling coefficient of rank-one tensor o y,. o z^^ for r = 1, • • • , i?. 
For convenience, we let a = (ai,*** G and [Q:;X,Y,Z]i^ = ° 

YrOZr in (2.2) where X = (xi,-- - ,xr) G = (yi,--- ,yfi) G and 

Z = (zi,-- - ,zii) G R^^^ are called the factor matrices of tensor A. We impose 
a normalization constraint on factor matrices such that each column is normalized 
to length one [24, 39] which is denoted by N(X,Y, Z) = 1. For most alternating 
optimization algorithms for tensors, flattening the tensor (matricization) is necessary 
to be able to break down the problem into several subproblems. Here we describe 
a standard approach for a matricizing of a tensor. The Khatri-Rao product of two 
matrices X G and Y G is defined as 

X© Y = (xi ©yi,-- - ,XRiSiyR) G 

where the symbol “©” denotes the Kronecker product: 

X © y = {xiyi, ■■■ , xiyj, • • • , xiyi, ■■■ , xjyj)'^. 

Using the Khatri-Rao product, the decomposition (2.2) can be written in three dif¬ 
ferent matrix forms of tensor A [6]: 

(2.3) A(i) = XD(Z © Y)^, A(2) = YD(Z © X)^, A(3) = ZD(Y © X)^ 

where the matrix D is diagonal with elements of a. 

3. Sparse optimization for low-rank approximation. The main goal of this 
study is to find a tensor of low-rank of the original tensor efficiently and accurately. 
We first formulate a tensor rank optimization problem: 

nnnrank(H) subject to \\A — B\\'^p<e. 

For any given error e, the minimal rank of B such that ||A — B\\f < £ is no larger 
than rank(A). The optimal solution H is a low-rank approximation of A with error e. 
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R 

We represent the tensor B as o o z^. = [a;X, Y, ZJ/j where i? is a 

r—1 

upper bound of the rank of A and columns of X, Y, Z satisfy the normalization con¬ 
straint N(X, Y, Z) = 1. Rescaling the columns of the matrices X, Y, Z is a standard 
technique [39, 1]. It is implemented in practice for canonical polyadic tensor decom¬ 
position to prevent the norm of the approximated matrices blowing up to infinity 
while another factor matrix tend to zero while keeping the residual small. 

The tensor rank minimization is equivalent to the following constraint optimiza¬ 
tion problem with Iq-hottii: 

(3.1) min||a||o s.t. M - [a; X, Y, Z]^i|||, < e, N(X, Y, Z) = 1 

Ot 

The problem (3.1) is equivalent to that of finding the rank of tensors when e = 0, 
whose decision version is NP-hard [21]. 

To make it more tractable, we turn to an optimization problem with Zi-norm: 

(3.2) min||a||i s.t. ||^ - [a; X, Y, Z]jj|[^ < e, N(X, Y, Z) = 1 
Furthermore, we then solve: 

(3.3) min i|[^ - [a; X, Y, Z]«|[^ + A|[a|[i s.t. N(X, Y, Z) = 1, 

X. Y .Zi.OL Z 

an Zi-regularization optimization problem in which it includes the factor matrices as 
primal variables. These optimization formulations are common in compressed sensing 
[9, 16, 8, 7, 10, 18]. By introducing the indicator function, we switch the constrained 
optimization problem (3.3) into the following unconstrained form: 

(3.4) min - [a; X, Y, Z]«|[^ + A||a||i + (X) + (Y) + <553(Z) 

X. Y .Z.CK Z 

where Si = {X|||xr|| = l,r = 1, • • • , i?}, 5*2 = {Y|||y^|| = l,r = I,-- - ,i?} and 
53 = {Z|||z,|| = l,r = l,...,i?}. 

Remark 3.1. Here there is no simple manner to compute the relationship between 
£ and A without already knowing the optimal solutions of formulations (3.2) and (3.3). 
In the matrix versions of Basis Pursuit: 

min ||0||i, s.t. ||b — B0|[ < e 
0 


and 

mmi||b-B6/||2-HA||6/||i, 

it is possible to create a mapping between the two parameters through a Pareto curve 
to estimate the relationship from the support of few solutions [4-0]. 

Our algorithm is tailored for solving the problem (3.4). Let the objective function 
in (3.4) as 

4'(X, Y, Z,q:) : X x R^^-« x R^ ^ R+, 


where 


vI/(X, Y, Z,a) = /(X, Y, Z,a) + g{a) + Js, (X) + <5^3 (Y) + ^53 (Z) 


6 


LOW-RANK APPROXIMATION OF TENSORS 


with the approximation term /(X, Y, Z,q:) = i||^ — [a; X, Y, Z the regularized 
penalty term g{cy.) = A||a||i and three indicator functions 5sj(X), (552(Y), Ss^CZ). 
The function /(•) is a real polynomial function on (X, Y, Z,q:) and the function 
g{») is a non-differential continuous function on a. Since 81 , 82,33 are closed sets, 
indicator functions 5sj(X), (5s2(Y) and Ss^iZ) are proper and lower semicontinuous. 
Moreover, since 5sj(X), i5s2(Y) and i5s3(Z) are three semi-algebraic functions, thus 
the objective function is also a semi-algebraic function. So it is a Kurdyka-Lojasiewicz 
(KL) function [4], For a point lj = (X, Y, Z,a) e x x x R^, if its 

(limiting) subdifferential [4], denoted by contains 0, we call it a critical point 

of The set of critical points of 4'(*) is denoted by Cij.. 

Due to ill-posedness [14, 29] of the best low-rank approximation of tensors, it is 
known that the problem of finding a best rank-R approximation for tensors of order 
3 or higher has no solution in general. However, after introducing the li penalty 
term A||q;||i to the low-rank approximation term /(•), it is always attainable for the 
minimization of the objective function in (3.4). Thus we have the following theorem 
to show the existence of the global optimal solution of problem (3.4). 

Theorem 3.1. The global optimal solution of problem (S.f) exists. 

Proof For any tensor M the objective function ^||M—[a; X, Y, Z]fl;|||,+ 

A||a||i + (5si(X) + (5 s2 (Y) + (5s3(Z) is denoted as 4 '(X,Y,Z,q:). Notice that all 
the columns of X, Y, Z in problem (3.4) are constrained to have length one. We 
define the d—dimensional unit sphere as A'^ = {v G R‘^|||v ||2 = 1}, and a set 
8 = {(X, Y, Z,q:) G (A^)^ X (A*^)^ X {A^)^ x R^}. Since this function 4'(*) 
is continuous on 8, we only need to show that there is a point s € 8 such that 
'I'(s) = inf{\E'(a;)|a; G 8}, i.e., the minimization of low-rank approximation with li 
penalty is attainable. 

For a scalar f > inf{\E'(a;)|a; G 8}, we will show that the level set L = {x G 
S'|4>(a;) < is compact. Since '!'(•) is continuous on 8, the set L is closed and we 
only need to prove that L is bounded. Actually, it is guaranteed by the h penalty 
term A||q;||i of '!'(•). Otherwise, unbounded points will take the penalty term go to 
inhnity contrary to the boundedness of '!'(•) on L. From the compactness of the level 
set L, the infimum inf{4'(x)|a; G L} is attainable because '!>(•) is continuous on L. 
Furthermore, since inf{4'(a;)|a; G 8} = inf{'l>(a;)|x G L}, there exists a point s G 8 
such that 'l>(s) = inf{4'(x)|a; G 8}. □ 

4. Low-rank approximation of tensor. In this section, we first describe an 
algorithm (LRAT) of low-rank approximation of tensor for computing the solution 
of problem (3.4), and then show some theoretical guarantees on the convergence of 
LRAT: (1) The sequence {(X”, Y", Z", a")}„gN generated by LRAT converges to a 
critical point of 4'(*). (2) The limit point of {(X", Y", Z", a")}„gN is a KKT point 
of problem (3.3). 

4.1. The algorithm. As in (2.3), the matricizations of tensor B = [a; X, Y, Z]ii 
via Khatri-Rao products are 

B(i) = XD(Z © Y)^, B(2) = YD(Z 0 X)^, B(3) = ZD(Y 0 X)^ 

where D = diag{ai, ■ ■ ■ , an). We introduce the following three matrices for updating 
in the Algorithm 1: 

(4.1) U = D(Z© Y)^,V = D(Z©X)^,W = D(Y0X)^. 

It follows that B( 3 ) = XU, B( 2 ) = YV and B( 3 ) = ZW. Thus the function 
/(X, Y, Z,a) can be written in three equivalent forms: 2 ||A(i) — XU III. = ^I|A(2)- 
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Algorithm 1 Low-Rank Approximation Of Tensors (LRAT) 

Input: A third order tensor A, an upper bound i? of rank(A), a penalty parameter 
A and a scale s > 1; 

Output: An approximated tensor B with an estimated rank i?; 

1 : Give an initial tensor B^ = [a°; X°, Y°, Z°]^. 

2 : Update step: 

b. Update matrices X,Y, Z: 

Compute U” by (4.1) and let dn = max{||U"U”^||p, 1}. 

Compute D" and X"+i by 

= X” - J-Vx/(X”, Y”, Z", 

Sdn 

X"+i=D"diag(||d(^||,--- ,||dS||)-i 

where d" is the i-th column of D" for i = 1, • • • , A. 

Compute V” by (4.1) and let e„ = maxlll1}. 

Compute E” and Y"+i by 

E” = Y" - — Vy/(X"+\Y",Z",q:"), 

SCn 

Y"+i=E"diag(||e-||,... ,||e^||)-i 


where e" is the i-th column of E" for i = 1, • • • ,R. 
Compute W" by (4.1) and let /„ = max{||W”W"^||F, 1}. 
Compute F” and by 

F" = Z" - — Vz/(X”+\ Y”+\Z",a”), 

sfn 

Z"+i=F"diag(||fr||,... ,||f;)||)-i 


where f" is the i-th column of F" for i = 1 , • • • ,R. 
c. Update the row vector a: 

Compute by (4.3) and let r/n = maxlHQ^+^Q^+^^IlF, !}■ 

Compute by (4.5) and use the soft thresholding: 

a"+i =5^(/3"+i). 


3; Denote the limitations by X,Y,Z,q;, compute B = [Q:;X,Y,Z]fl and count the 
number R of nonzero entries in a. 

4: return The tensor B with the estimated rank R. 


YV|||. = ^||A(3) — ZW|||.. Furthermore, we have the gradients of /(•) on X, Y, Z: 

Vx/(X, Y, Z,a) = (XU - A(i))U^, 

(4.2) VY/(X,Y,Z,a) = (YV-A(2 ))V^, 

Vz/(X, Y, Z,a) = (ZW - A(3))W^. 


Using the vectorization of tensors [19], we can vectorize every rank-one tensor of 
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outer product x,, o o z^. into a row vector for 1 < r < i?. We denote a matrix 
consisting of all for 1 < r < i? by 

(4.3) Q^(qf,...,qS)^. 

Thus the function /(X,Y, Z,q:) can be also written as ^||a— aQ|||,, where a is a 
vectorization for tensor A. Furthermore, the gradient of /(•) on a is 

(4.4) Vc«/(X,Y,Z,a) = (aQ-a)Q^. 

Our algorithm starts from (X^, Y^', Z^, a^) and iteratively update variables X, Y, Z 
and then a in each loop. Inspired by the equation (1.4), the update of X is based on 
the following constraint optimization problem: 

C(j 

argmm{(X - X", Vx/(X", Y", Z", oA)) + ^||X - X"|||} 
s.t. ||Xi|| = l,i = 1,... ,i?, 

where X = (xi,... ,Xfl) S dn = max{||U"U”^||i:’, 1} and U" is computed 

from a”, Y”, Z" by (4.1). This problem is equivalent to: 

argnnn{||X - D"|||.} s.t. ||xj|| = l,i= I,--- ,R- 
where D” = X" — .j^Vx/(X"', Y", Z", a"). So we obtain the update of X: 

xr+i = dr/iidrii,* = i,--- ,i?, 

where x"“''^ and d" are the i-th columns of X"+^ and D". 

Similarly, the update of Y is based on the following optimization problem: 

argnnn{(Y - Y", Vy/(X"+i, Y", Z", a")) + ^||Y - Y^} 
s.t. ||y,|| = l,i= I,--- ,R, 


where Y = (yi,--- ,yfl) G e„ = max{||V”V”^|| f, 1} and V" is computed 

from a", X"+^, Z" by (4.1). So we obtain the update of Y: 

yr+i=er/||er||,* = l,---,i?, 

where y"+^ ande^ are the i-th columns ofY’"+i andE" = Y"-^Vy/(X”+i, Y”, Z", a’"). 
The update of Z is based on the following constraint optimization problem: 

argnnn{(Z- Z",Vz/(X"+i, Y"+i,Z",a’^)) + f|l||Z - Z"|||.} 
s.t. ||zi|| = 1,1= 1,... ,R, 

where Z = (zi,... , z/{) G /„ = max{|| W”W”^|lir, 1} and W” is computed 

from a",X’"+i, Y"+i by (4.1). The update of Z is: 


where and are the i-th columns of Z”"*"^ and F” = Z” 


^Vz/(X"+i,Y"+i,Z'*,a"). 
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Finally, we consider to update ot by using the equation (1.4): 
argmin{(a - V„/(C"+\ X"+\ Y"+\ Z"+i, a")) + ^||a - a"f + A||a||i}. 

OL I 


where ? 7 „ = max{||Q"+^Q”+^^||i 7 ’, 1 } and can be computed from Z"+^ 

by (4.3). This optimization problem is equivalent to: 


argnnn — ||Q: — csl 
OL 2 


+-V„/(C"+\X'*+\Y"+\Z"+\ 


srin 


2^ 

sr]n 


a 1- 


So we can obtain the update form for ct in Algorithm 1 by using the separate soft 
thresholding: 


a’^+i =5^(/3"+i) 

ST7n 

where 

(4.5) /3”+i = a" - — V„/(C"+\X’"+\Y"+\Z"+\q:"). 

srjn 

It should be noted that if we set A = 0, the LRAT algorithm turns into a mod¬ 
ified alternative least square method (modALS). This modified ALS algorithm uses 
linearized iterative technique [4, 42] to update variables in each step. Although the 
regularization parameter A is fixed in Algorithm 1, we can adaptively choose it for 
practical computation, which will be shown in the Section 5. 

Remark 4.1. In our algorithm, the computational complexity mainly comes from 
matrix multiplications. The Update Step (2h) for updating a in LRAT Algorithm 
require more cpu time than the Update Step (2a) because of the large matrix dimension 
of Q. The complexity of our algorithm is 0{NIJKR^), where N is the total number 
of iteration. 

4.2. Convergence of algorithm. In this subsection, we illustrate the conver¬ 
gence mechanism of the LRAT algorithm, which is a rescaling version of the proximal 
alternating linear minimization algorithm [4]. The following Lemma 4.1 points out 
that for the function /(cu) = |||Al— [a; X, Y, Z]/{|||, , the gradient Vcj/(a;) of f{co) 
is Lipschitz continuous on bounded subsets and all the partial gradients of /(u;) are 
globally Lipschitz with modulus. 

Lemma 4.1. Let f{uj) be the approximation term i||Al— [a; X, Y, Z]/j|||;. where 
u! = (X,Y, Z,q:). We have that the gradient function V/ is Lipschitz continuous 
on bounded subsets of x x x R-^, i.e., for any bounded subset 

B G R^^^ X R'^^^ X R*"^^ X R-^, there exists M > 0 such that for any iOi,uj 2 G B, 

||V^/(u;i) - V^/(a; 2 )||F < M\\u,i - W 2 ||f. 

Moreover, for any fixed X G R^^^, Y G R‘^^^,Z G G R^, there exist four 

constants c,d,e,r] > 0 such that: 

||Vx/(Xi,Y,Z,a) - Vx/(X2,Y,Z,a)||j. < d||Xi - X2||F,/or anyX,,^^ G R"^« 
||Vy/(X, Yi,Z,a) - Vy/(X, Y2,Z,a)||f < e||Yi - Y2||F,/or any Y,,Y2 G R^^^ 

II Vz/(X, Y, Zi, a) - Vz/(X, Y, Z2, a)||F < /||Zi - Z 2 ||F,/or any Zi, Z 2 G R^^« 
||Va/(X,Y, Z,q:i) - Va/(X,Y, Z,q:2)||f < T?||ai - a2||F,/or anyai,a2 G R^ 


10 


LOW-RANK APPROXIMATION OF TENSORS 


where d = ||UUT||p,e = ||VVT||^,/ = ||WWT||^,,y = ||QQT||^. 

The proof has not been included since it relies on standard techniques. In our LRAT 
algorithm, those Lipschitz constants rely on the iterative number n and have a lower 
bound 1. Specifically, dn = max{||U"+^U"'''‘^^|lF, 1}, e„ = max{|| 1}, 

/„ = max{||W"+iW"+i^||^, 1}, 7?„ = max{||Q"+iQ"+i^||^^, 1}. 

Lemma 4.2. (Sujficient decrease property [4]) Let f : K™ —^ K 6e a continuously 
differentiable with gradient V/ assumed Lj-Lipschitz continuous and let a : K™ —>■ 
(—oo,+oo] he a proper and lower semicontinuous function with infRm a > —oo. For 
any t > Lf and u € dom a, define 

u'^ = argmin{(a; — m, V/(m)) + ^\\x — u||^ + o'(u)}. 

X 2 

Then we have that 

(4.6) f{u) + a{u) - (/(m+) + (t{u+)) > ^(t - Lf)\\u+ - uf. 

Lemma 4.3. Let'!'(•) be the objective function in problem (3.4)- //(X”, Y”, Z", a" 
and {dm Cn, fmVn)nen o-fe generated by our LRAT algorithm, we have that for any 
s > 1 

4'(X", Y”,Z”,q;’") - 4'(X’"+\ Y’",Z",a”) > i(s - l)d„||X" - X”+^||^, 
4'(X"+\Y",Z",a") - T(X"+\Y’"+\Z",a”) > i(s - l)e„||Y” - Y”+^||^, 
4'(X"+\Y’"+\Z”,a”)- 4'(X"+\Y"+\Z”+\a’") > i(s - 1)/„||Z” - Z"+i^, 
4 '(X"+\Y"+\Z”+\q:’")- 4'(X’"+\Y’"+\Z"+\a"+i) > i(s - l)77„||a” - 

Proof. These four inequalities can be obtained by using Lemma 4.2. □ 

The following lemma shows that the value of 4'(*) monotonically decreases on 
the sequence (ci;")„gN, which is generated by our algorithm. 

Lemma 4.4. Let 'I'(a;) be the objective function 

i||A - [a; X, Y, Z]r\\% + A||a||i + 5s,(X) + 5s,(Y) + 5s3(Z) 
where us = (X, Y, Z, ol), then 

(i) the sequence {'I'(a;")}„gN is nonincreasing and for any n G N, there is a scalar 
f) > 0 such that 'I'(a;”) — 'I'(a;”+^) > /3||a;” — 

{ii) lim ||X" - X"+1 ||f -t 0, lim ||Y’" - Y”+i||i. 0, lim ||Z" - Z^+^\\f ->■ 0 and 

n—¥oo n—¥oo n—¥oo 

lim ||a" — —?► 0. 

n—¥oo 

(Hi) the sequence is bounded. 

Proof. In our algorithm, all the Lipschitz constants dn, en, fn,Vn > So by 
Lemma 4.3, T(w") - T(w"+i) > /3||w"where /3 = min{(s-I)/2,1/2}. We 
can obtain the first conclusion (i). 

OO 

The second conclusion {ii) holds from the first one because the sum (4'(ci;") — 

n—0 

4'(a;"+^)) is finite. 
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If the sequence {a;"}„gN is unbounded, it means that {Q:"}neN is unbounded 
since columns of X",Y",Z" are constrained to have length one. So the sequence 
is unbounded since 4 '(C, X, Y, Z,a) > A||q:||i. From the conclusion (i), 
is nonincreasing. Since '!'(•) has a lower bound, the sequence {\E'(u;„)}„gN is 
not unbounded. It is a contradiction. So the sequence {a;”}„gN must be bounded. □ 
Furthermore, from Lemma 4.1 and the boundness shown in Lemma 4.4, we can 
obtain the following Lipschitz upper bounds for subdifferentials. 

Lemma 4.5. Let u;" = (X”, Y”, Z”, a") be the sequence generated by our LRAT 
algorithm. There exist four positive scales Li,L2,L3 and L4 such that the following 
inequalities hold for any n G N. 

There is some 77” £ 9 x'I'(a;”) such that ||j7"||f < 

There is some iglf £ ^'I'(ci;") such that II172IIF < ^211^^" — 

There is some rjJ £ 5 z'I'(ci;") such that II173 ||f < 

There is some £ 9ci'I'(ci;") such that II174 ||f < L4\\lo'^ — 

Proof By the update of X, 

X" = argnnn{(X - X”-\ Vx/(C”, X"-\ Y"-\ Z”-\ a”"!)) 

+ f^||X-X'‘-i||| + ^S,(X)}. 

So we have that Vx/(X"-\ Y"-i, Z"*"!, a"-i) + s(i„(X" - X”"-!) + = 0 where 

u” £ 9 xi 5 si(X"). Hence 

< = sd„(X"-i - X") - Vx/(X”-\ Y"-\ Z”-\ 

Since Vx/(X”, Y", Z", a") + u" £ 9 x 4 '(X”, Y", Z”, a"), we have that 

=Vx/(X”, Y”, Z”, a”) + sd„(X”-i - X”) 
-Vx/(X"-i,Y"-i,Z"-i,a"-i) 

=Vx/(X”,Y",Z”,a”)+u” 

£ 5 x 4 '(X”,Y”,Z”,a”) 


By Lemma 4.1 and the boundness of {a;”}„gN, we have that there exists a constant 
Li such that ||» 7 "||f < Li\\uj'^ — u;”“^||f. 

Similarly, we can choose 

r,J=VY/(X",Y",Z",a") + se„(Y"-i-Y") 

(4-8) 1 1 1 

^ ' T-7 rfsrLi -xTri—l rrn—1 ^n—l\ 

- Vy/(X ,Y ,Z ,a ) 

and 

V3 =Vz/(X", Y", Z", a") + s/„(Z"-i - Z") 

- Vz,f(X",Y”,Z”-\Q:"-i) 


So qf £ ^4'(X", Y", Z", a") and q"^ £ 9z4'(X", Y", Z", a"). Furthermore, there 
exist constants L 2 and L 3 such that ||i 72 ||f < —‘^"~^| 1 f and Hrjg ||f < LgHu)" — 

u;"-1||f. 

By the update of a, 


(4.10) 


V„/(X", Y”, Z”, + S77„(a" - 


) + u” = 0 
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where u” € da.giot'^) and gfx) = A||x||i. Denote r/? as Va/(X”, Y”, Z”, a") + u”. 
Thus we have that rjJ e Y", Z", a") and 

IWIIf = ||V„/(X",Y",Z",a") + u"||f 

< ||V„/(X", Y’^,Z",a") - V„/(X", Y",Z”,a"-i)||^ + sg^\\a^ - a^-^y 
<L4\y^-uj^y\F 

We also get the last inequality by using Lemma 4.1 and the boundness of {a;"}„gN. □ 
The following theorem shows that the sequence of the LRAT algorithm is conver¬ 
gent to a critical point of 4'(#). 

Theorem 4.6. Let be a sequence generated by the LRAT algorithm 

from a starting point . Then the sequence {a;"}„gN converges to a critical point 
io* of'^y). 

Proof. By Lemma 4.3, the sufficient decrease property is satisfied that there is a 
constant /3 > 0 such that for any n € N 

/3||w" - < T(w") - 4'(u)"+i). 

By Lemma 4.5, the iterates gap has a lower bound by the length of a vector in 
the subdifferential of di*. There is a constant L > 0 and {? 7 "}„gN such that for any 
n € N, 

||r/"||E<i^||^"-^"-'||F 

where rj" £ 94'(a;"). 

Furthermore, since 4'(*) is a KL function, we complete the proof by using Theorem 
3.1 in [4]. □ 

A point (jO = (X, Y, Z, a.) is called as a KKT point of problem (3.3) if there are 
three diagonal matrices Hi, H 2 , H 3 £ and a vector u £ dag{ot) such that 

Vx/(u;) + XHi = 0, Vy/(^) + YH 2 = 0, VzfH + ZH 3 = 0 
’ V„/(u;)+u = 0,N(X,Y,Z) = l. 

In the following, we show that the limit point oj* = (C*, X*, Y*, Z*, a*) of the se¬ 
quence is a KKT point of problem (3.3). 

Corollary 4.7. Let a>* = (X*, Y*,Z*,q:*) be the limit point of the sequence 
generated by the LRAT algorithm. //X*, Y* and Z* has full column rank, 
the limit point u}* is a KKT point of problem (3.3). 

Proof N(X*,Y*,Z*) = 1 is obvious since N(X",Y",Z") = 1 and the conver- 
gency of From (4.10), there exists a vector u £ dag{a*) such that 

V„/(u;*)+u = 0 . 

By the update of X, there is a diagonal matrix H" such that 

Vx/(X”-\ Y"-\ Z"-\ q:"- 1) + sd„(X" - X"-^) -h X"H)( = 0. 

By the convergency of {t^"}ragN, we have that H” is convergent to some diagonal 
matrix H* since X* has full column rank. Furthermore, we can obtain 

Vx./(X*,Y*,Z*,q;*) -HX*Ht = 0. 

Similarly, we have 

VY/(X*,Y*,Z*,a*) + Y*H; = 0, Vz,/(X*,Y*,Z*,q:*) + Z*H* = 0. 

This completes the proof of this corollary. □ 
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5. Probabilistic consistency of the sparsity. In this section, we will discuss 
the probabilistic consistency of the sparsity of the optimal solution to problem (3.3). 
We will see that under a suitable choice on the regularization parameter, the optimal 
solution can recover the true sparsity in a statistical model with a high probability. 

For a given regularization parameter A > 0, an optimal solution to problem (3.3) 
is denoted by 

(X,Y,Z,a) = argmini||Al- [a;X,Y,Z]«||| + A||a||i s.t. N(X,Y,Z) = 1. 

X .Y .Z .ct ^ 

As shown in Section 4.1, we can construct a Rx {I*J*K) matrix Q = (q[, • • • , q]J)^ = 
((X 0 Y) 0 Z)^ from (4.3), and vectorize tensor A into a row vector a. 

For convenience, we introduce new variables: b, 0 , B for a^, cc^, respectively. 
Thus b and 0 are column vectors with dimension I * J * K and i?, and B is a 
{I * J * K) X R matrix. Furthermore, we have the following equality 

(5.1) - [a; X, Y, Z]«||| + ||a||i = i||b - B0\\l + A||0||i. 

The optimal solution for tensor approximation problem (3.3) is also an optimal 
solution 0 of a standard ^i-regularized least square problem 

(5.2) mmi||b-B0||2 + A||0||i. 

Assume that b and B have a sparse representation structure as 

(5.3) b = B6/*+£, 

where all the columns of B are normalized to one. The variable 0* is a sparse signal 
with k non-zero entries {k < R), and e is a vector with independent subgaussian 
entries of mean zero and parameter cr^. 

Denote a subgradient vector in d\\0 111 a,s (3 = (/3i,--- The entries of /3 

satisfy that for any 1 < i < R, Pi = sgn{9i) if 0^ 0 and Pi G [—1,1] if 0i = 0. As 

shown in the Lemma 1 of [41], 0 is an optimal solution to problem (5.2) if and only 
if there exists a subgradient vector /3 G i9||0||i such that 

(5.4) -B^(b-B0)-f A/9 = 0 

if and only if there exists a subgradient vector /3 G 9||0||i such that 

(5.5) B^B(0-0*)-B^e + A/3 = O. 

Assume that B is a full column rank matrix. Then the objective function in 
problem (5.2) is strictly convex, and the optimal solution 0 to problem (5.2) is unique 
and exact ci^. Denote S and S as the index sets of non-zero entries in 0* and 0 
respectively. So the sparse signal 0* can be rewritten as {0f,oT )^ and the cardinality 
of S is k. We will show in Theorem 5.1 that the optimal solution 0, which is also 
the oP", of problem (5.2) may become a suitable approximation for the real sparse 
signal 0*. Similar results shown in [41, 43] consider the case BPB/n —>■ C as n —>■ oo 
or n“^/^maxjg 5 o ||BjH < 1 where n is the number of rows in B, while in this paper 
all the Bj are normalized to one. We can further obtain a specihc probability bound 
shown in Theorem 5.1, which relies only on two intrinsic parameters of model. 
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According to the unknown set S, we can separate columns of the design matrix 
B as two parts (B 5 ,B 5 c), where S'^ is the complement of S. Moreover, since B 5 
also have full column rank, there exists a unique solution 63 by solving the restricted 
Lasso problem; 


(5.6) mini||b-Bs05||i-kA||6»s||i. 

If furthermore satisfies the equation (5.5), thus is the unique 

optimal solution 6 to problem (5.2) since B has full column rank. Moreover, we also 
obtain that the index set S' C S'. From (5.5), if Os satisfies two equations: 

(5.7) B^BsiOs - e*s) - B^e + X$s = 0 
and 

(5.8) B^aBs(0s-0S)-B^a£ + A3sc =0, 

where $s S 9||0s||i and ||/ 35 c||oo = max|/3j| < 1 , we have that 0 = ( 0 g,O’^)’^ 

satisfies the equation (5.5) and {ps,i^'sc)'^ £ Actually, since 63 minimizes 

the problem (5.6), there exists /3s G S||0s||i such that the equation (5.7) holds. So if it 
happens with a high probability that the equation (5.8) holds and ||/3sc||oo < 1, thus 
the event F = {( 0 ^, 0 ^ )^is the unique optimal solution 9 to problem(5.2)} happens 
with a high probability. Furthermore, the event {S C S} also happens with a high 
probability. We are going to show these in the the following part of this section. 
From equations (5.7) and (5.8), we have that: 

(5.9) /3so = B^^Bs(B^Bs)-i/3s + Bjo(I - Bs(B^Bs)-iB^)i, 

(5.10) 63 = 63-9*3 = {BlBs)-\Ble - X 0 s)- 

For any j G 5"^, we have that 

0 , = BjBs(B^Bs)-i/3s + Bj(I- Bs(B^Bs)-'B^)i = +a;,. 

We assume that there exists an incoherence parameter 7 G (0,1] such that 
||BgcBs(BgBs)“^||oo <1 — 7 , where matrix norm ||M||oo = max^|My|. It is 

® i 

easy to obtain = |BJBs(B^Bs)"^4sI < I|B|cBs(B^Bs)“^|| oo <1 — 7. And 
let us consider Wj = BJ(I — Bs{BgBs)~^Bg)j = y(ciei -!-■•• + c„e„), where 
(ci,--- ,c„) = Bj(I — Bs(BgBs)“^Bg). Thus ojj is a subgaussian distribution 
with zero mean and parameter x(ci + • • • + c^) = 72 Bj (I — Bs{BgBs)~^Bg)Bj. 
Since BjBj = 1, this parameter is no more than So Pr(max \ujj\ > t) < 

2{R — fc)exp(—^^), where k is the cardinality of S. By choosing t = ^ 7 , we have 
that Pr(max \uij \ > ^ 7 ) < 2(P — k) exp(—^^). Thus we have that 

(5.11) Pr(max \0j\ > 1 - ^) < Pr(max|wj| > ^ 7 ) < 2(P - fc) exp(-^-^). 
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Now let us consider about the upper bound of dg: ||^s||oo ^ ||(B^Bs)-ib^£|U + 
A||(BgB5)“^||oo- Since A||(BgB5)“^||oo has a fixed value, we only need to consider 

the first term. For any i G S,we have that u, = ef (BgBs)“^Bg£ = ciEi H-l-c„e„, 

where (ci,--- ,c„) = ef (BgB5)“^Bg. If we assume that Amm(BgBs) > /r, thus Vi 
is a subgaussian distribution with zero mean and parameter ^{cf + ■■■ + c^) = 
a'^ej < —■ Thus Pr(max|ui| > t) < 2fcexp(—By choosing t = 

"we have that 


(5.12) Pr(max|?;i| > < 2fcexp(-^) < 2feexp(-^^). 


By combining (5.11) and (5.12), we have the probability inequality Pr({max | > 

j&S‘^ 

1 — ^}lj{m^|ui| > 2^}) < 2i?exp(—-g^). Thus the probability inequality on the 
complementary set is that 


Pr({max |/3,| < 1 - < ^}) > 1 - 2i?exp(-^). 


Furthermore, we have that 


(5.13) Pr(rf|{||d5||oo < ^ + A||(B^Bs)-1||oo}) > 1 - 2Pexp(-^), 

where F = {{6g,0"^ )^is the unique optimal solution 6 to problem (5.2)}. 

From the above discussion, we obtain the following Theorem 5.1, which illustrates 
the probabilistic consistency of the optimal solution 6 to problem (5.2). 

Theorem 5.1. Suppose that the sparse structure (5.3) exists, the sparse signal 
9 * = )^ and B has full column rank. If there exist some parameters 7 

and fi where 0 < 7 < 1 and p, > 0 such that ||BgcB 5 (BgB 5 )“^||oo <1 — 7 and 
XminfB's^s) > fJ-, wc have that 

(5.14) Pr{{S C 5}f|{||d5||oo < ^ + A||(BiBs)-i||oo}) > 1 - 2Pexp(-^), 


where S is the index set of non-zero entries in 6, and Ss = 0s ~ and 0$ is the 
optimal solution of (5.6). Furthermore, if the lower bound of the absolute values of 
elements in 0s is larger than + ||(BgBs)“^||oo), we have that 


(5.15) 


Pri{S = S})>1- 2i?exp(-^). 


Proof. In terms of (5.13), the first inequality (5.14) follows from {S Q S} ^ F, 
where F = {(0g, 0^ )^is the unique optimal solution 0 to problem (5.2)}. 

If ||0s - ^slloo = ||ds||oo < 2 ^ + A||(B^B5 )“^||oo and the lower bound of the 
absolute values of elements in 0 g is larger than + A||(BgBs)“^||oo, it can be 

checked that the entries in 0s and 0s of the same index have the same sign. From 
(5.13), we can obtain the second inequality (5.15). 0 
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R 



Fig. 1. Trend of the estimated rank R. 


Theorem 5.1 tells us that if we want to recover the sparsity in (5.3) with a prob- 

\ 2 2 

ability p, we should choose a A such that 1 — 2i?exp(—-g^) > p when we know 
the intrinsic parameters 7 and cr^. So to adaptively give a regularization parame¬ 
ter A based on the data A, we need to give two guesses on the intrinsic parameters 
7 and (T^. We set A to zero in the Algorithm 1, and compute a estimated tensor 
B=[a;X,Y,Z]n from the tensor data A. The parameter cr^ is estimated by using 
the variance of all the entries in the difference A — B, and the parameter 7 is set 
as 7 = 1 — max{|(Bi,Bj)||i ^ j}, where B^ is the i-th column in B = (X © Y) 0 Z. 
With regularization parameter A = | •\/2d^log(20(I^, the result of our algorithm is 
shown by using the simulated and real data in the next Section. 

6. Numerical Experiment. In this section, we have four types of numerical 
experiments for testing the performance of our algorithm. The codes of the hrst 
three experiments are written in Matlab with simulated data. In all the simulations, 
the initial guesses are randomly generated. The stopping criterion used in the all 
experiments depends on two parameters: one is the upper bound of the number 
of iteration iteration number (eg. iter_max= 10000 ), and the other is a tolerance 
to decide whether convergence has been reached (eg. conv_tol= e“^°). The fourth 
numerical experiment is executed in with OpenCV for surveillance video data. 
These experiments ran on a laptop computer with Intel i5 CPU 3.3GHz and 8 G 
memory. 

6.1. Estimated rank. We randomly create a tensor A € ^loxioxio 5 

rank-one components, and then use LRAT to estimate the rank of A along with the 
increment of the regularization parameter. The upper bound R of rank(7l) is fixed 
to 10 in the algorithm while the regularization parameter A varies from 0 to 0.1 by 
step 0.001. As shown in Figure 1, the estimated rank R has a decreasing trend as 
the parameter A increases for these particular random tensor examples. Heuristically, 
the reason for this trend lies is in the minimization the objective function in (3.3), an 
increase in A reduces the value of ||q;||i and thus the estimated rank R. 

6.2. Accuracy of the estimated rank. We randomly generate three kinds of 
tensors with various dimensions and various rank-one component numbers (cn). The 
estimated rank R is calculated with the regularization parameter A = ^ \/2d-^ log(200i?). 
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where < 7 ^ and 7 are computed as discussed in Section 5. Table 1 shows the mean and 
standard deviation of the estimated rank. 

Table 1 

Mean and standard deviation of the estimated rank R. 



I=J=K=5 

0 

II 

II 

II 

'—I 

0 

II 

II 

II 

cn = 2 

2.28 (0.87) 

3.25 (1.31) 

5.41 (1.85) 

cn = 3 

3.15 (0.93) 

4.49 (1.12) 

7.2 (2.06) 

cn = A 

3.6 (0.92) 

5.18 (1.16) 

8.35 (1.82) 

cn = 5 

* 

5.77 (1.29) 

9.98 (1.60) 

cn = 8 

* 

7.52 (1.01) 

10.88 (1.51) 

cn = 10 

* 


11.69 (1.50) 

cn = 15 

* 

* 

14.11 (1.43) 


For each component number an = 2,3,4, we randomly generate 100 tensors in 
]^ 5 x 5 x 5 f = J = K = 5, and then use the LRAT with the upper bound i? = 5 to 
compute the estimated rank R. As shown in Table 1, when the rank-one component 
number cn = 3, the average estimation difference of i? — cn is 0.15 and the standard 
deviation of R is 0.93. 

Similarly, for each component number cn = 2,3,4, 5, 8 , we randomly generate 
100 tensors in Rioxioxio £gj. _ 2,3,4,5,8,10,15, we randomly generate 100 
tensors in ]R 20 x 20 x 20 upper bound R is set to / = 10, 20. The mean and standard 
deviation of R are shown in the last two columns of Table 1. 

6.3. Comparison between LRAT and modALS. In this subsection, we show 
the comparison between LRAT and modALS [42] on a toy model. A tensor A in 
R^x 5 x 5 jg randomly generated with 3 rank-one components. Based on the residual 
function, the modALS algorithm approximates tensor A by a tensor of five rank-one 
components, while the LRAT algorithm solves (3.3) and obtain an estimate on the 
rank of tensor A. 




(a) Residual of LRAT and modALS (b) Objective function of LRAT 

Fig. 2. Comparison between LRAT and modALS. 

Figure 2 (a) demonstrates the residual function ||A — [d; X, Y, Z]_r|||. for the 
modALS and LRAT algorithms. Compared to the LRAT, the modALS executed by 
using five rank-one components decreases monotonically and has a low misfit on the 
residual function. Figure 2 (b) shows the objective function jljA— [d; X, Y, -|- 
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A||q:||i in (3.3) for the LRAT algorithm. It decreases monotonically and provides an 
estimate on the number of rank-one components as shown in Section 6.2, though the 
LRAT gave one less digit accuracy in the residual. 

6.4. Application in surveillance video. Grayscale video data is a natural 
candidate for third-order tensors. Due to the correlation between subsequent frames of 
video, there exists some potential low-rank mechanism in the data. In this subsection, 
we apply the LRAT and the modALS to two surveillance videos^ on Fountain and 
Lobby. For each video of 220 consecutive frames, we choose a region of interest with 
a resolution 30 x 30. 



(a) 20 frames for Fountain (b) Results from LRAT (c) Results from modALS 



(d) 20 frames for Lobby (e) Results from LRAT (f) Results from modALS 


Fig. 3. Computation results based on LRAT and modALS. 

Figure 3 demonstrates simulation results on the LRAT and the modALS. Here 
the upper bound R is fixed to 400. Figure 3 shows 20 frames in the original video data 
A and those frames estimated by the LRAT and the modALS. The modALS provides 
an approximation with three factor matrices of (30 + 30-1-220) x 400 elements. For the 
LRAT algorithm, the regularization parameter A is set to log(200i?), where 

and 7 are computed as discussed in Section 5. The estimated number of rank-one 
components in B is 378 for the Fountain video. The representation of B with three 
factor matrices only needs (30 + 30 + 220) x 378 elements. The estimated number of 
rank-one components is 392 for the Lobby video, and the representation with three 
factor matrices needs (30 + 30 + 220) x 392 elements. Compared to the modALS 
algorithm, the LRAT has a smaller estimated rank but it sacrifices more cpu time, 
because the LRAT algorithm requires an instructive (a starter) choice on A. For this 
case, we used the modALS algorithm to obtain a starter choice A for LRAT. 

7. Conclusion and future work. We propose an algorithm based on the prox¬ 
imal alternating minimization to detect the rank of tensors. This algorithm comes 


^The original data is from http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html 
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from the understanding of the low-rank approximation of tensors from sparse opti¬ 
mization. We also provide some theoretical guarantees on the convergence of this 
algorithm and a probabilistic consistency of the approximation result. Moreover, we 
suggest a way to choose a regularization parameter for practical computation. The 
simulation studies suggested that our algorithm can be used to detect the number of 
rank-one components in tensors. 

The work presented in this paper have potential applications and extensions, 
especially in video processing and latent component number estimation. The ongoing 
work is to apply this low-rank approximation method to moving object detection and 
video data compression. 
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